relative loss
Group Relative Knowledge Distillation: Learning from Teacher's Relational Inductive Bias
Li, Chao, Zhou, Changhua, Chen, Jia
Knowledge distillation typically transfers knowledge from a teacher model to a student model by minimizing differences between their output distributions. However, existing distillation approaches largely focus on mimicking absolute probabilities and neglect the valuable relational inductive biases embedded in the teacher's relative predictions, leading to exposure bias. In this paper, we propose Group Relative Knowledge Distillation (GRKD), a novel framework that distills teacher knowledge by learning the relative ranking among classes, rather than directly fitting the absolute distribution. Specifically, we introduce a group relative loss that encourages the student model to preserve the pairwise preference orderings provided by the teacher's outputs. Extensive experiments on classification benchmarks demonstrate that GRKD achieves superior generalization compared to existing methods, especially in tasks requiring fine-grained class differentiation. Our method provides a new perspective on exploiting teacher knowledge, focusing on relational structure rather than absolute likelihood.
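The pairwise-preference idea behind GRKD can be illustrated with a toy surrogate loss (a hypothetical sketch, not the paper's exact objective: the function name, margin parameter, and averaging scheme are all assumptions):

```python
def group_relative_loss(teacher_logits, student_logits, margin=0.1):
    """Toy pairwise ranking surrogate: penalize the student whenever its
    margin between two classes disagrees with the teacher's preference
    ordering, instead of matching absolute probabilities."""
    loss, pairs = 0.0, 0
    k = len(teacher_logits)
    for i in range(k):
        for j in range(k):
            if teacher_logits[i] > teacher_logits[j]:
                # Teacher prefers class i over class j: ask the student to
                # separate them by at least `margin`.
                loss += max(0.0, margin - (student_logits[i] - student_logits[j]))
                pairs += 1
    return loss / max(pairs, 1)
```

A student whose logits preserve the teacher's ordering with comfortable margins incurs zero loss, regardless of how its absolute values differ from the teacher's.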
The Surprising Benefits of Base Rate Neglect in Robust Aggregation
Kong, Yuqing, Wang, Shu, Wang, Ying
Robust aggregation integrates predictions from multiple experts without knowledge of the experts' information structures. Prior work assumes experts are Bayesian, providing predictions as perfect posteriors based on their signals. However, real-world experts often deviate systematically from Bayesian reasoning. Our work considers experts who tend to ignore the base rate. We find that a certain degree of base rate neglect helps with robust forecast aggregation. Specifically, we consider a forecast aggregation problem with two experts who each predict a binary world state after observing private signals. Unlike previous work, we model experts exhibiting base rate neglect, where they incorporate the base rate information to degree $\lambda\in[0,1]$, with $\lambda=0$ indicating complete ignorance and $\lambda=1$ perfect Bayesian updating. To evaluate aggregators' performance, we adopt Arieli et al. (2018)'s worst-case regret model, which measures the maximum regret across the set of considered information structures compared to an omniscient benchmark. Our results reveal a surprising V-shape of regret as a function of $\lambda$. That is, predictions that incorporate the base rate to an intermediate degree $\lambda<1$ can, counter-intuitively, lead to lower regret than perfect Bayesian posteriors with $\lambda=1$. We additionally propose a new aggregator that achieves low regret and is robust to unknown $\lambda$. Finally, we conduct an empirical study to test the base rate neglect model and evaluate the performance of various aggregators.
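The degree-$\lambda$ base rate neglect model can be sketched as a discounted Bayesian update (a minimal illustration assuming the standard formulation in which the prior enters the posterior raised to the power $\lambda$; the function name is hypothetical):

```python
def brn_posterior(likelihood1, likelihood0, prior1, lam):
    """Probability of state 1 when the base rate is incorporated to degree
    lam: posterior proportional to likelihood * prior**lam.
    lam=1 recovers Bayesian updating; lam=0 ignores the base rate entirely."""
    w1 = likelihood1 * prior1 ** lam
    w0 = likelihood0 * (1.0 - prior1) ** lam
    return w1 / (w1 + w0)
```

At $\lambda=0$ the prediction reduces to the normalized likelihood ratio, and at $\lambda=1$ it equals the exact Bayesian posterior.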
Information loss from dimensionality reduction in 5D-Gaussian spectral data
Understanding the loss of information in spectral analytics is a crucial first step towards finding root causes for failures and uncertainties when spectral data are used in artificial intelligence models built from modern complex data science applications. Here, we show from an elementary Shannon entropy model analysis with quantum statistics of Gaussian-distributed spectral data that the relative loss of information from dimensionality reduction, due to the projection of an initial five-dimensional dataset onto two-dimensional diagrams, is less than one percent in the parameter range of small datasets with sample sizes on the order of a few hundred data samples. From our analysis, we also conclude that the density and expectation value of the entropy probability distribution increase with the sample number and sample size, using artificial data models derived from random-sampling Monte Carlo simulation methods.
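The kind of sample-based entropy estimate underlying such an analysis can be sketched with a simple histogram plug-in estimator (a generic toy, not the paper's quantum-statistics model; the function name and bin count are assumptions):

```python
import math

def shannon_entropy(samples, bins=20):
    """Plug-in Shannon entropy (in nats) of 1-D samples, estimated from a
    histogram of the empirical distribution."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0   # degenerate data collapses to one bin
    counts = [0] * bins
    for x in samples:
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(samples)
    return -sum(c / n * math.log(c / n) for c in counts if c)
```

Applying such an estimator coordinate-wise before and after a projection gives one crude way to quantify the relative information retained by a lower-dimensional representation.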
Maximum Optimality Margin: A Unified Approach for Contextual Linear Programming and Inverse Linear Programming
Sun, Chunlin, Liu, Shang, Li, Xiaocheng
The predict-then-optimize problem considers a learning problem under a decision making context where the output of a machine learning model serves as the input of a downstream optimization problem (e.g. a linear program). The ultimate goal of the learner is to prescribe a decision/solution for the downstream optimization problem using directly the input (variables) of the machine learning model but without full observation of the input of the optimization problem. A similar problem formulation was also studied as prescriptive analytics (Bertsimas and Kallus, 2020) and contextual linear programming (Hu et al., 2022). While Elmachtoub and Grigas (2022) justifies the importance of leveraging the optimization problem structure when building the machine learning model, the existing efforts on exploiting the optimization structure have been largely inadequate. In this paper, we delve deeper into the structural properties of the optimization problem and propose a new approach called maximum optimality margin which builds a max-margin learning model based on the optimality condition of the downstream optimization problem. More importantly, our approach only needs the observations of the optimal solution in the training data rather than the objective function, thus it draws an interesting connection to the inverse optimization problem. The connection gives a new shared perspective on both the predict-then-optimize problem and the inverse optimization problem, and our analysis reveals a scale inconsistency issue that arises practically and theoretically for many existing methods.
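The optimality-margin idea can be illustrated on the simplest LP, minimizing $c \cdot x$ over the probability simplex, whose optimum is the vertex with the smallest cost; the optimality condition is then just $c_{j^*} \le c_j$ for all $j$. The hinge-style penalty below is a hypothetical toy in this spirit, not the paper's general construction:

```python
def optimality_margin_loss(pred_costs, opt_index, margin=1.0):
    """Toy optimality-margin loss for min c.x over the simplex: penalize the
    predicted cost vector whenever the observed optimal vertex `opt_index`
    fails to beat every other vertex by at least `margin`. Note that only the
    optimal solution (the index), not the true objective, is needed."""
    c_star = pred_costs[opt_index]
    return sum(max(0.0, margin - (c - c_star))
               for j, c in enumerate(pred_costs) if j != opt_index)
```

Training a cost-prediction model against such a loss uses only observed optimal solutions, which is the connection to inverse optimization mentioned above.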
Linear Hinge Loss and Average Margin
We describe a unifying method for proving relative loss bounds for on-line linear threshold classification algorithms, such as the Perceptron and the Winnow algorithms. For classification problems the discrete loss is used, i.e., the total number of prediction mistakes. We introduce a continuous loss function, called the "linear hinge loss", that can be employed to derive the updates of the algorithms. We first prove bounds w.r.t. the linear hinge loss and then convert them to the discrete loss. We show how relative loss bounds based on the linear hinge loss can be converted to relative loss bounds in terms of the discrete loss using the average margin.
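The relation between the discrete loss and a hinge-style surrogate can be sketched with a plain Perceptron (a simplified illustration: the surrogate used here is `max(0, -y * w.x)`, which stands in for, but is not identical to, the paper's linear hinge loss):

```python
def perceptron_train(examples, epochs=1):
    """Perceptron on (x, y) pairs with y in {-1, +1}. Tracks both the
    discrete loss (number of mistakes) and a continuous hinge-style
    surrogate max(0, -y * w.x) accumulated over the run."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    mistakes, hinge_total = 0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x))
            hinge_total += max(0.0, -y * score)   # continuous surrogate
            if y * score <= 0:                    # discrete loss: a mistake
                mistakes += 1
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes, hinge_total
```

On linearly separable data the mistake count stops growing after a finite number of updates, which is exactly the quantity the discrete-loss bounds control.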
Decentralized Feature-Distributed Optimization for Generalized Linear Models
Ancelin, Brighton, Bahmani, Sohail, Romberg, Justin
We consider the "all-for-one" decentralized learning problem for generalized linear models. The features of each sample are partitioned among several collaborating agents in a connected network, but only one agent observes the response variables. To solve the regularized empirical risk minimization in this distributed setting, we apply the Chambolle--Pock primal--dual algorithm to an equivalent saddle-point formulation of the problem. The primal and dual iterations are either in closed-form or reduce to coordinate-wise minimization of scalar convex functions. We establish convergence rates for the empirical risk minimization under two different assumptions on the loss function (Lipschitz and square root Lipschitz), and show how they depend on the characteristics of the design matrix and the Laplacian of the network.
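The Chambolle--Pock primal--dual iteration can be sketched on the simplest instance, least squares written in saddle-point form $\min_x \max_y \langle Ax, y\rangle - (\tfrac{1}{2}\|y\|^2 + \langle b, y\rangle)$ (a generic textbook illustration, not the paper's feature-distributed variant; step sizes must satisfy $\tau\sigma\|A\|^2 \le 1$):

```python
def chambolle_pock(A, b, tau, sigma, iters=2000):
    """Chambolle--Pock iterations for the saddle-point form of least squares
    min_x 0.5 * ||Ax - b||^2. Both the dual prox (closed form) and the primal
    prox (identity, since g = 0) are cheap, as in the paper's setting."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    xbar = x[:]
    y = [0.0] * m
    for _ in range(iters):
        Axbar = [sum(A[i][j] * xbar[j] for j in range(n)) for i in range(m)]
        # Dual step: prox of f*(y) = 0.5||y||^2 + <b, y> in closed form.
        y = [(y[i] + sigma * Axbar[i] - sigma * b[i]) / (1 + sigma) for i in range(m)]
        Aty = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
        x_new = [x[j] - tau * Aty[j] for j in range(n)]   # primal step (g = 0)
        xbar = [2 * x_new[j] - x[j] for j in range(n)]    # extrapolation
        x = x_new
    return x
```

On a well-conditioned system the iterates converge to the least-squares solution; in the decentralized setting the same primal and dual steps are split across agents.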
Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses
Shang, Xuedong, Shao, Han, Qian, Jian
Multi-armed bandits are widely applied in scenarios like recommender systems, where the goal is to maximize the click rate. However, more factors should be considered, e.g., user stickiness, user growth rate, user experience assessment, etc. In this paper, we model this situation as a $K$-armed bandit problem with multiple losses. We define the relative loss vector of an arm, where the $i$-th entry compares the arm with the optimal arm with respect to the $i$-th loss. We study two goals: (a) finding the arm with the minimum $\ell^\infty$-norm of relative losses at a given confidence level (fixed-confidence best-arm identification); (b) minimizing the $\ell^\infty$-norm of cumulative relative losses (regret minimization). For goal (a), we derive a problem-dependent sample complexity lower bound and discuss how to achieve matching algorithms. For goal (b), we provide a regret lower bound of $\Omega(T^{2/3})$ and a matching algorithm.
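With the true losses in hand, the objective this abstract describes reduces to a small computation (a sketch of the objective only, not of the bandit algorithms; the function name is hypothetical):

```python
def linf_relative_losses(loss_matrix):
    """loss_matrix[k][i] is the i-th loss of arm k. The i-th entry of arm k's
    relative loss vector compares it with the best arm for loss i; return the
    per-arm l-infinity norms and the arm minimizing that norm."""
    n_losses = len(loss_matrix[0])
    best = [min(row[i] for row in loss_matrix) for i in range(n_losses)]
    norms = [max(row[i] - best[i] for i in range(n_losses)) for row in loss_matrix]
    return norms, norms.index(min(norms))
```

Note that the arm minimizing the $\ell^\infty$-norm of relative losses need not be optimal for any single loss, which is what makes the identification problem non-trivial.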
Relative Loss Bounds for On-line Density Estimation with the Exponential Family of Distributions
Azoury, Katy S., Warmuth, Manfred K.
We consider on-line density estimation with a parameterized density from the exponential family. The on-line algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example the algorithm incurs a loss which is the negative log-likelihood of the example w.r.t. the past parameter of the algorithm. An off-line algorithm can choose the best parameter based on all the examples. We prove bounds on the additional total loss of the on-line algorithm over the total loss of the off-line algorithm. These relative loss bounds hold for an arbitrary sequence of examples. The goal is to design algorithms with the best possible relative loss bounds. We use a certain divergence to derive and analyze the algorithms. This divergence is a relative entropy between two exponential distributions.
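The protocol can be sketched for the simplest member of the exponential family, a unit-variance Gaussian, where the running mean of past examples is the natural on-line parameter (a toy illustration of the setting only; the paper's algorithms and divergence-based analysis are more general):

```python
import math

def online_vs_offline_logloss(xs):
    """On-line density estimation with a unit-variance Gaussian: predict each
    example with the running mean of the past ones, incur the negative
    log-likelihood, and compare with the offline best parameter (the overall
    mean). The returned difference is the relative (regret) loss."""
    def nll(x, mu):
        return 0.5 * math.log(2 * math.pi) + 0.5 * (x - mu) ** 2
    online, mu, n = 0.0, 0.0, 0
    for x in xs:
        online += nll(x, mu)       # loss w.r.t. the past parameter
        n += 1
        mu += (x - mu) / n         # update: average of the examples so far
    best = sum(xs) / len(xs)
    offline = sum(nll(x, best) for x in xs)
    return online - offline
```

The relative loss is always non-negative here, since the offline mean minimizes the total negative log-likelihood; the bounds in the paper control how large it can get on an arbitrary sequence.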
Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
Tsuda, Koji, Rätsch, Gunnar, Warmuth, Manfred K.
We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: Online learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the non-diagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
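The update's structure can be sketched by working in log-space: since $W_{t+1} \propto \exp(\log W_t - \eta \nabla L)$, one can accumulate the log-parameter and only ever apply a matrix exponential (a minimal sketch assuming a dense-gradient step; the truncated Taylor series for the exponential is a simplification suitable only for small matrices, and both function names are hypothetical):

```python
def mat_exp(S, terms=30):
    """Matrix exponential of a small square matrix via its Taylor series."""
    n = len(S)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[sum(term[i][l] * S[l][j] for l in range(n)) / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def meg_step(S, grad, eta):
    """One Matrix Exponentiated Gradient step kept in log-space: update
    S <- S - eta * grad, then W = exp(S) / trace(exp(S)), which is symmetric
    positive definite with trace one by construction."""
    n = len(S)
    S = [[S[i][j] - eta * grad[i][j] for j in range(n)] for i in range(n)]
    E = mat_exp(S)
    tr = sum(E[i][i] for i in range(n))
    W = [[E[i][j] / tr for j in range(n)] for i in range(n)]
    return S, W
```

For diagonal matrices this reduces exactly to the vector Exponentiated Gradient update on a probability vector, matching the generalization described above.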